57 research outputs found
A Bulk-Parallel Priority Queue in External Memory with STXXL
We propose the design and an implementation of a bulk-parallel external
memory priority queue to take advantage of both shared-memory parallelism and
high external memory transfer speeds to parallel disks. To achieve higher
performance by decoupling item insertions and extractions, we offer two
parallelization interfaces: one using "bulk" sequences, the other by defining
"limit" items. In the design, we discuss how to parallelize insertions using
multiple heaps, and how to calculate a dynamic prediction sequence to prefetch
blocks and apply parallel multiway merge for extraction. Our experimental
results show that in the selected benchmarks the priority queue reaches 75% of
the full parallel I/O bandwidth of rotational disks and 65% of SSDs, or the
speed of sorting in external memory when bounded by computation. Comment: extended version of SEA'15 conference paper
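The idea of spreading insertions over several heaps and extracting through a multiway merge can be illustrated with a small in-memory sketch. It is sequential and not STXXL code: the class name `BulkPriorityQueue`, its methods, and the round-robin distribution are assumptions made for illustration only, and no external memory or prefetching is modelled.

```python
import heapq


class BulkPriorityQueue:
    """Toy min-priority queue with several insertion heaps (hypothetical names)."""

    def __init__(self, num_heaps=4):
        # One binary heap per (imagined) inserting thread.
        self.heaps = [[] for _ in range(num_heaps)]

    def bulk_push(self, items):
        # Distribute the bulk round-robin over the insertion heaps.
        for idx, item in enumerate(items):
            heapq.heappush(self.heaps[idx % len(self.heaps)], item)

    def bulk_pop(self, count):
        # Multiway merge: keep a small heap of (current minimum, heap index)
        # pairs and repeatedly take the globally smallest item.
        tops = [(h[0], i) for i, h in enumerate(self.heaps) if h]
        heapq.heapify(tops)
        out = []
        while tops and len(out) < count:
            _, i = heapq.heappop(tops)
            out.append(heapq.heappop(self.heaps[i]))
            if self.heaps[i]:
                heapq.heappush(tops, (self.heaps[i][0], i))
        return out


pq = BulkPriorityQueue(num_heaps=3)
pq.bulk_push([5, 3, 8, 1, 9, 2, 7])
smallest_three = pq.bulk_pop(3)
```

In a real external-memory setting the merge inputs would be sorted runs on disk rather than in-memory heaps; the sketch only shows the merge logic.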
Near-Optimal Computation of Runs over General Alphabet via Non-Crossing LCE Queries
Longest common extension queries (LCE queries) and runs are ubiquitous in
algorithmic stringology. Linear-time algorithms computing runs and
preprocessing for constant-time LCE queries have been known for over a decade.
However, these algorithms assume a linearly-sortable integer alphabet. A recent
breakthrough paper by Bannai et al. (SODA 2015) showed a link between the
two notions: all the runs in a string can be computed via a linear number of
LCE queries. The first to consider these problems over a general ordered
alphabet was Kosolobov (Inf. Process. Lett., 2016), who presented an
-time algorithm for answering LCE queries. This
result was improved by Gawrychowski et al. (accepted to CPM 2016) to time. In this work we note a special non-crossing property
of LCE queries asked in the runs computation. We show that any such
non-crossing queries can be answered on-line in time, which
yields an -time algorithm for computing runs.
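For readers unfamiliar with the term, an LCE query on positions i and j of a string returns the length of the longest common prefix of the suffixes starting at i and j. The sketch below is a naive character-by-character baseline, not the algorithm of the paper; the function name `lce` is an assumption. It uses only equality tests on symbols, so it works over any alphabet, at the cost of linear time per query.

```python
def lce(s, i, j):
    """Length of the longest common prefix of s[i:] and s[j:] (naive scan)."""
    k = 0
    while i + k < len(s) and j + k < len(s) and s[i + k] == s[j + k]:
        k += 1
    return k


# "abra" occurs at positions 0 and 7 of "abracadabra".
example = lce("abracadabra", 0, 7)
```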
Succinct Data Structures for Families of Interval Graphs
We consider the problem of designing succinct data structures for interval
graphs with vertices while supporting degree, adjacency, neighborhood and
shortest path queries in optimal time in the -bit word RAM
model. The degree query reports the number of incident edges to a given vertex
in constant time, the adjacency query returns true if there is an edge between
two vertices in constant time, the neighborhood query reports the set of all
adjacent vertices in time proportional to the degree of the queried vertex, and
the shortest path query returns a shortest path in time proportional to its
length, thus the running times of these queries are optimal. Towards showing
succinctness, we first show that at least bits
are necessary to represent any unlabeled interval graph with vertices,
answering an open problem of Yang and Pippenger [Proc. Amer. Math. Soc. 2017].
This is augmented by a data structure of size bits while
supporting not only the aforementioned queries optimally but also capable of
executing various combinatorial algorithms (like proper coloring, maximum
independent set etc.) on the input interval graph efficiently. Finally, we
extend our ideas to other variants of interval graphs, for example, proper/unit
interval graphs, k-proper and k-improper interval graphs, and circular-arc
graphs, and design succinct/compact data structures for these graph classes as
well, while supporting queries on them efficiently.
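The degree, adjacency and neighborhood queries named in the abstract can be made concrete with a plain, non-succinct representation. The sketch below stores each vertex as a closed interval and answers queries by scanning. It is only a reference for what the queries return, not the succinct structure of the paper, and the function names are assumptions.

```python
def adjacent(intervals, u, v):
    """True if distinct vertices u and v have overlapping closed intervals."""
    (a, b), (c, d) = intervals[u], intervals[v]
    return u != v and a <= d and c <= b


def neighborhood(intervals, u):
    """All vertices adjacent to u, found by a linear scan."""
    return [v for v in range(len(intervals)) if adjacent(intervals, u, v)]


def degree(intervals, u):
    """Number of edges incident to u."""
    return len(neighborhood(intervals, u))


# Four vertices; vertex 3 is isolated.
intervals = [(1, 4), (3, 6), (5, 8), (9, 10)]
```

Here every query costs time linear in the number of vertices, whereas the abstract reports constant time for degree and adjacency and time proportional to the degree for neighborhood.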
Computing Covers under Substring Consistent Equivalence Relations
Covers are a kind of quasiperiodicity in strings. A string is a cover of
another string if any position of is inside some occurrence of in
. The shortest and longest cover arrays of have the lengths of the
shortest and longest covers of each prefix of , respectively. The literature
has proposed linear-time algorithms computing longest and shortest cover arrays
taking border arrays as input. An equivalence relation over strings
is called a substring consistent equivalence relation (SCER) iff
implies (1) and (2) for all . In this paper, we generalize the notion of covers for SCERs and prove
that existing algorithms to compute the shortest cover array and the longest
cover array of a string under the identity relation will work for any SCERs
taking the accordingly generalized border arrays. Comment: 16 pages
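The definitions of a cover and of the shortest cover array can be checked with a brute-force sketch under the identity relation (ordinary string equality). This is far slower than the linear-time algorithms the abstract refers to and does not use border arrays; the function names are assumptions chosen for illustration.

```python
def is_cover(c, t):
    """True if every position of t lies inside some occurrence of c in t."""
    if not c or len(c) > len(t):
        return False
    covered_upto = 0  # positions [0, covered_upto) are covered so far
    for i in range(len(t) - len(c) + 1):
        if t.startswith(c, i):
            if i > covered_upto:
                return False  # a gap before this occurrence
            covered_upto = i + len(c)
    return covered_upto == len(t)


def shortest_cover_array(t):
    """Entry k-1 is the length of the shortest cover of the prefix t[:k]."""
    result = []
    for k in range(1, len(t) + 1):
        prefix = t[:k]
        # A cover of prefix must itself be a prefix of it, so try each length.
        result.append(next(m for m in range(1, k + 1) if is_cover(prefix[:m], prefix)))
    return result


covers = shortest_cover_array("abab")
```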
Reconstructing phylogenies from noisy quartets in polynomial time with a high success probability
Background: In recent years, quartet-based phylogeny reconstruction methods have received considerable attention in the computational biology community. Traditionally, the accuracy of a phylogeny reconstruction method is measured by simulations on synthetic datasets with known "true" phylogenies, while little theoretical analysis has been done. In this paper, we present a new model-based approach to measuring the accuracy of a quartet-based phylogeny reconstruction method. Under this model, we propose three efficient algorithms to reconstruct the "true" phylogeny with a high success probability.
Results: The first algorithm can reconstruct the "true" phylogeny from the input quartet topology set without quartet errors in $O(n^2)$ time by querying at most $(n - 4) \log(n - 1)$ quartet topologies, where $n$ is the number of taxa. When the input quartet topology set contains errors, the second algorithm can reconstruct the "true" phylogeny with probability approximately $1 - p$ in $O(n^4 \log n)$ time, where $p$ is the probability of a quartet topology being an error. This probability is improved by the third algorithm to approximately $\frac{1}{1 + q^2 + \frac{1}{2} q^4 + \frac{1}{16} q^5}$, where $q = \frac{p}{1 - p}$, with a running time of $O(n^5)$; this probability is at least 0.984 when $p < 0.05$.
Conclusion: The three proposed algorithms are mathematically guaranteed to reconstruct the "true" phylogeny with a high success probability. The experimental results showed that the third algorithm produced phylogenies with a higher probability than its aforementioned theoretical lower bound and outperformed some existing phylogeny reconstruction methods in both speed and accuracy.
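The approximate success probability quoted for the third algorithm, 1 / (1 + q^2 + q^4/2 + q^5/16) with q = p / (1 - p), is easy to evaluate numerically. The sketch below only transcribes that expression as it appears in the abstract; the function name `success_lower_bound` is an assumption, and nothing here reproduces the algorithm itself.

```python
def success_lower_bound(p):
    """Evaluate 1 / (1 + q^2 + q^4/2 + q^5/16) for q = p / (1 - p)."""
    if not 0 <= p < 1:
        raise ValueError("p must satisfy 0 <= p < 1")
    q = p / (1 - p)
    return 1 / (1 + q**2 + q**4 / 2 + q**5 / 16)


# Value of the expression at the error rate mentioned in the abstract.
bound_at_5_percent = success_lower_bound(0.05)
```

The expression decreases as p grows, so its value at p = 0.05 is a lower bound for all smaller error rates.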
Higher levels of glutamate in the associative-striatum of subjects with prodromal symptoms of schizophrenia and patients with first-episode psychosis
The glutamatergic and dopaminergic systems are thought to be involved in the pathophysiology of schizophrenia. Their interaction has been widely documented and may have a role in the neurobiological basis of the disease. The aim of this study was to compare, using proton magnetic resonance spectroscopy (1H-MRS), glutamate levels in the precommissural dorsal-caudate (a dopamine-rich region) and the cerebellar cortex (negligible for dopamine) in the following: (1) 18 antipsychotic-naïve subjects with prodromal symptoms and considered to be at ultra high-risk for schizophrenia (UHR), (2) 18 antipsychotic-naïve first-episode psychosis patients (FEP), and (3) 40 age- and sex-matched healthy controls. All subjects underwent a 1H-MRS study using a 3 Tesla scanner. Glutamate levels were quantified and corrected for the proportion of cerebrospinal fluid and percentage of gray matter in the voxel. The UHR and FEP groups showed higher levels of glutamate than controls, without differences between UHR and FEP. In the cerebellum, no differences were seen between the three groups. The higher glutamate level in the precommissural dorsal-caudate and not in the cerebellum of UHR and FEP suggests that a high glutamate level (a) precedes the onset of schizophrenia, and (b) is present in a dopamine-rich region previously implicated in the pathophysiology of schizophrenia. Peer-reviewed.
Range Minimum Query Indexes in Higher Dimensions
Range minimum queries (RMQs) are essential in many algorithmic procedures. The problem is to prepare a data structure on an array to allow for fast subsequent queries that find the minimum within a range in the array. We study the problem of designing indexing RMQ data structures which only require sub-linear space and access to the input array while querying. The RMQ problem in one-dimensional arrays is well understood, with known indexing data structures achieving optimal space and query time. Two-dimensional indexing RMQ data structures have received the attention of researchers recently. There are also several solutions for the RMQ problem in higher dimensions. Yuan and Atallah [SODA'10] designed a brilliant data structure of size O(N) which supports RMQs in a multi-dimensional array of size N in constant time for a constant number of dimensions. In this paper we consider the problem of designing indexing data structures for RMQs in higher dimensions. We design a data structure of size O(N) bits that supports RMQs in constant time for a constant number of dimensions. We also show how to obtain trade-offs between the space of indexing data structures and their query time.
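As background for the one-dimensional case, the classic sparse-table approach preprocesses an array so that any range minimum is found with two table lookups and one comparison against the input array. The sketch below is that textbook technique, not the structure proposed in the paper: it uses O(N log N) words rather than O(N) bits and handles a single dimension. The function names are assumptions.

```python
def build_sparse_table(a):
    """table[j][i] is the index of the minimum of a[i : i + 2**j]."""
    n = len(a)
    table = [list(range(n))]
    j = 1
    while (1 << j) <= n:
        prev = table[-1]
        half = 1 << (j - 1)
        row = []
        for i in range(n - (1 << j) + 1):
            left, right = prev[i], prev[i + half]
            row.append(left if a[left] <= a[right] else right)
        table.append(row)
        j += 1
    return table


def rmq(a, table, lo, hi):
    """Index of the leftmost minimum of a[lo..hi], both ends inclusive."""
    k = (hi - lo + 1).bit_length() - 1  # largest power of two fitting the range
    left = table[k][lo]
    right = table[k][hi - (1 << k) + 1]
    return left if a[left] <= a[right] else right


values = [3, 1, 4, 1, 5, 9, 2, 6]
table = build_sparse_table(values)
```

Like the indexing structures discussed in the abstract, the query consults the input array while answering, since the table stores only positions and not values.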